Abstract

Future detectors for high luminosity particle identification and ultra high energy neutrino observation would 
benefit from a digitizer capable of recording sensor signals with high analog bandwidth and large record depth, in 
a cost-effective, compact and low-power way. 

A first version of the Buffered Large Analog Bandwidth (BLAB1) ASIC has been designed based upon the lessons 
learned from the development of the Large Analog Bandwidth Recorder and Digitizer with Ordered Readout 
(LABRADOR) ASIC. While this LABRADOR ASIC has been very successful and forms the basis of a generation 
of new, large-scale radio neutrino detectors, its limited sampling depth is a major drawback. A prototype has been 
designed and fabricated with 64k deep sampling at multi-GSa/s operation. We present test results and directions 
for future evolution of this sampling technique. 

1. Introduction

Observation of the early universe through neu- 
trino messengers of the highest possible energies 
requires a detector of enormous instrumented vol- 
ume [1]. At the same time, lepton flavor identi- 
fication of such radio detection events represents 
a completely unique tool for the study of cosmo- 
logical evolution of the universe. Particle interac- 
tions at extreme energies provide a probe capable 
of illuminating the completely unknown accelera- 
tion mechanisms of the highest energy cosmic ray 
events [2]. 

Particle identification is also crucial to the
physics program of a next generation "Super" 
B Factory. Such an accelerator will produce B 
mesons in sufficiently copious quantities to permit 
detailed scrutiny of standard model predictions in 
the flavor sector [3]. Any new theories for physics 
beyond the standard model must leave fingerprints 
that can be detected via flavor transformation 
of particles in the final state. Therefore, particle 
identification is essential and the detector and 
readout electronics must survive the very high 
signal occupancies expected [4].

We present results from a deep-sampling ASIC 
that meets these requirements, based upon exten- 
sion of the successful LABRADOR ASIC [5]. 

2. Architectural Details

The BLAB1 ASIC is a single channel, multi- 
GSa/s waveform sampler with a record depth of 
$2^{16}$ analog storage samples. The BLAB1 analog input
is AC coupled with an external capacitor and terminated
in 50 Ω by an on-chip terminator, as
should be expected for a high-performance RF de- 
vice. After the on-chip terminator, an analog buffer 
tree fans out copies of the signal to the matrix of 
128 rows of 512 samples composing the 64k array. 
Each of the rows may be independently addressed 
to initiate a storage cycle. Within each Switched 
Capacitor Array (SCA) storage cell is a capacitor 
and a comparator. A block diagram of the BLAB1 
readout is shown in Fig. 1. 

Fig. 1. A block diagram of the BLAB1 readout, where for
compactness the comparators are located inside the BLAB1 
device, while the high-speed time encoding is done inside 
a companion FPGA. 

When an analog switch is pulsed closed, the in- 
stantaneous input signal is stored on a 14 fF capac- 
itor. The charge is then held until either overwrit- 
ten or discharged due to leakage current. Each sam- 
pling capacitor is connected to the negative input 
of a comparator. The positive input of each com- 
parator is connected to a common voltage ramp. 
A wire-bonded BLAB1 die photograph is shown 
in Fig. 2, with this storage array contained within 
about 5.25 square mm of the die shown. 



Fig. 2. A photograph of the wire-bonded BLAB1 ASIC. 
The die is 3 mm by 2.8 mm and was fabricated in the 
TSMC 0.25 µm process.

Conversion of these stored samples is via a 
Wilkinson ADC method, where the stored voltage 
is converted into a transition time of the in-cell 
comparator due to an applied voltage ramp. This 
ramp is generated with a current mirror and can be 
adjusted both by varying the ramping current, as 
well as an external capacitor. The typical ramping 
current range is 10-100 µA and the ramp capacitor
size is a few hundred pF. Encoding is performed 
by measuring the time interval between the ramp 
start and the comparator output transition. In 
a simple form of time-to-digital conversion, this 
interval is measured by counting the number of 
high-speed clock cycles taken. In the predecessor 
ASIC [5], the Gray code counter was implemented
on-chip, whereas in BLAB1 it is implemented 
inside a companion programmable logic device, 
in this case a Field Programmable Gate Array 
(FPGA). When the voltage ramp is started, a 
Gray code counter in the FPGA is enabled coincident
with a high-speed clock (500 MHz) and the
comparator output is used to latch the counter 
value. By knowing the ramping voltage slope and 
the high speed clock frequency, the latched counter 
value can be converted into voltage. A group of 32 
comparators are selected, as illustrated in Fig. 3, 
and are read out during each ramping cycle. 
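
As a rough illustration of the conversion arithmetic described above, the following Python sketch decodes a latched Gray-code count and converts it to a voltage using the ramp slope. The function names, the 50 µA ramp current, the 200 pF ramp capacitor and the zero ramp start voltage are illustrative assumptions chosen within the ranges quoted in the text; they are not taken from the BLAB1 firmware.

```python
def gray_to_binary(gray: int) -> int:
    """Decode a reflected-binary Gray code value to a plain integer."""
    value = 0
    while gray:
        value ^= gray
        gray >>= 1
    return value


def count_to_volts(count: int, ramp_current_a: float, ramp_cap_f: float,
                   clock_hz: float, v_start: float = 0.0) -> float:
    """Voltage reached by a linear ramp after `count` clock cycles.

    A constant current I charging a capacitor C ramps at dV/dt = I / C.
    """
    slope_v_per_s = ramp_current_a / ramp_cap_f
    return v_start + slope_v_per_s * count / clock_hz


# Assumed example values: 50 uA into 200 pF, 500 MHz counter clock.
latched_gray = 2048 ^ (2048 >> 1)   # Gray code that encodes count 2048
count = gray_to_binary(latched_gray)
volts = count_to_volts(count, 50e-6, 200e-12, 500e6)
print(f"count = {count}, voltage = {volts:.3f} V")
```

With these assumed values the ramp slope is 0.25 mV/ns, so a count latched after 2048 cycles of the 500 MHz clock corresponds to about 1 V above the ramp start.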

Fig. 3. Schematic of the BLAB1 sampling array.

By addressing a row and selecting a group of 
32 columns for each conversion cycle, the window 
of interest inside the ASIC is read out. Impor- 
tantly, this readout operation can be done while 
sampling continues, providing continuous pipelin- 
ing and subsequent deadtime reduction. This de- 
cision to move the high-speed clock and registers 
off-chip also means that the size of each storage cell 
can be significantly reduced. A schematic of the 
base BLAB1 storage cell is shown in Fig. 4, where 
the comparator is simply a differential NMOS pair. 
The corresponding layout is shown in Fig. 5, where 
the overall dimensions are 40λ by 139λ, where λ =
0.12 µm. This corresponds to 4.8 µm by 16.68 µm,
or about 80 µm² required per storage cell.

Fig. 4. Schematic of the BLAB1 basic storage cell.

Fig. 5. Layout of a single SCA storage cell, where the
dimensions displayed are in units of λ, which is 0.12 µm.

Therefore the core of the sampling array requires
only 5.25 mm² of chip area, permitting more than
an order of magnitude improvement in storage
density compared with existing devices [5,6,7,8].
Reducing the cell size and subsequently the stor- 
age capacitance also helps improve the bandwidth 
that can be coupled into each storage cell. Since 
the "on" resistance of the switch is relatively high 
($R_{on} \sim 5\,\mathrm{k\Omega}$), frequencies above

$f_{3dB} = \frac{1}{2\pi R_{on} C_{pix}}$    (1)

will roll off for a given pixel capacitance $C_{pix}$.
The extracted capacitance value for the layout in
Fig. 5 is approximately 14 fF. Therefore the expected
$f_{3dB}$ from the common input bus line into
each storage cell is approximately 2.3 GHz.

We note that the size of the storage cell can be 
reduced further by removing individual sample de- 
lay timing chains from each storage row. As seen 
in the bottom of Fig. 5, this inverter pair is more 
than half the area of the storage cell. For power 
dissipation reasons, this removal turns out to be 
important, as will be discussed later. 

A further benefit of decoupling the latching register
and clocking functionality is that the conversion
clock can be run at a much higher speed inside
the FPGA, since it is routinely fabricated in either 
a 65 nm or 90 nm process, compared with the relatively
coarse 250 nm (0.25 µm) process of BLAB1.
Typically with the chosen Xilinx Virtex family employed
we are able to use a 500 MHz clock, and
record the phase of the clock as well, thereby effectively
having a 1 ns least count. Separate testing
indicates that this TDC performs very close to the
ideal binary interpolation limit of $1\,\mathrm{ns}/\sqrt{12}$ (~300 ps), as
reported previously [9]. Moreover, the number of 
bits of resolution or precision can be completely 
configurable, which permits a trade-off of the read- 
out latency versus required sample resolution for 
various applications. We note in passing that there 
is a potentially much better method based upon 
applying this same waveform sampling technique 
to the timing encoding of the comparator output. 
The TDC least count would then improve from 1 ns to
170 ps and, by fitting the output shape, better than
binary encoding time resolution may be possible.

While the coupling into individual storage cells 
can support high analog bandwidth (> 2GHz), the 
cumulative capacitance seen when trying to drive 
the array of 64k cells is very problematic. The extracted
capacitance of each of the $2^{16}$ switch drains
is about 1.5 fF, which sums to a total array capacitance
$C_{array}$ of 98 pF. Clearly, for a reasonably low
input coupling impedance of $Z_{in} = 50\,\Omega$, this
bandwidth limitation to

$f_{3dB} = \frac{1}{2\pi Z_{in} C_{array}} = 32.5\,\mathrm{MHz}$    (2)

would be completely unacceptable. Therefore a 
3-level buffer tree has been employed, to reduce 
the loading seen at each stage of signal fan-out. 
The unity-gain bandwidth of these buffer amplifiers
at zero capacitive load is in excess of 1 GHz. In retrospect,
the choice of fanout, 1 → 16 → 128, was not optimal,
as the capacitance of the intermediate stage was
rather high and limits the performance, as will be
shown in the testing section.
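
The two single-pole bandwidth estimates above, Eq. (1) for one storage cell and Eq. (2) for the unbuffered array, can be checked numerically. The sketch below uses only values quoted in the text; the function and variable names are illustrative.

```python
import math


def f3db_hz(resistance_ohm: float, capacitance_f: float) -> float:
    """-3 dB frequency of a single-pole RC low-pass, 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)


N_CELLS = 2 ** 16

# Eq. (1): one storage cell behind its ~5 kOhm sampling switch.
cell_bw_hz = f3db_hz(5e3, 14e-15)

# Eq. (2): all switch drains (1.5 fF each) driven directly from 50 Ohm.
array_cap_f = N_CELLS * 1.5e-15
array_bw_hz = f3db_hz(50.0, array_cap_f)

print(f"cell  f3dB ~ {cell_bw_hz / 1e9:.2f} GHz")
print(f"array C    ~ {array_cap_f * 1e12:.1f} pF")
print(f"array f3dB ~ {array_bw_hz / 1e6:.1f} MHz")
```

The roughly 70-fold gap between the two results is what motivates the buffer tree.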

The sampling speed is controlled by adjusting 
the VDD/VSS supply voltages of one of the two 
inverter-inverter delay stages between each adja- 
cent sampling cell in a particular sampling row. 



As mentioned earlier, by addressing a row and
pulsing the first cell of that particular row, a 
write strobe then propagates along the row until 
it reaches the last cell in the row. The leading 
edge of the pulse closes the switch and the trailing 
edge opens the switch, at which point the analog 
voltage value is stored. 

Upon the determination of an external trigger 
condition, further sampling into the row(s) of interest
is blocked in firmware and a ramping voltage
is generated by using a constant current source
and reference capacitor, as mentioned earlier. The 
ramping voltage for the BLAB1 can be generated 
using either an external capacitor or an on-chip capacitor.
An external capacitor is necessary for slower
ramping speeds. The current source is set by an
external resistor. A unique feature of the BLAB1's
digitization technique is that the ADC resolution 
does not have a default value. For a fixed clock fre- 
quency, reducing the ramping voltage speed will in- 
crease ADC resolution. However, by using a slower 
ramp, it will take longer to digitize. 
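
The resolution versus conversion time trade-off can be made concrete with a short sketch. The 1.024 V conversion range and the two ramp slopes are assumed values picked to give round numbers; only the 500 MHz counter clock comes from the text.

```python
import math


def ramp_tradeoff(v_range: float, ramp_v_per_s: float, clock_hz: float):
    """Return (effective bits, conversion time in s) for a linear ramp.

    The ramp needs v_range / slope seconds to sweep the range, and each
    clock cycle elapsed during that sweep is one distinguishable ADC code.
    """
    t_conv_s = v_range / ramp_v_per_s
    n_codes = t_conv_s * clock_hz
    return math.log2(n_codes), t_conv_s


# Assumed slopes: 0.5 mV/ns (fast) and 0.125 mV/ns (slow).
fast_bits, fast_time_s = ramp_tradeoff(1.024, 0.5e6, 500e6)
slow_bits, slow_time_s = ramp_tradeoff(1.024, 0.125e6, 500e6)

print(f"fast ramp: {fast_bits:.1f} bits in {fast_time_s * 1e6:.3f} us")
print(f"slow ramp: {slow_bits:.1f} bits in {slow_time_s * 1e6:.3f} us")
```

Under these assumptions, slowing the ramp by a factor of four buys two extra bits at the cost of a four times longer conversion.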

BLAB1 was designed to be a low power ADC. 
Three voltage sources are required to operate the 
BLAB1. A voltage source of 2.5 volts is the main 
power source. An adjustable VDD source is used 
to control the sampling speed. A pedestal voltage, 
typically 1.3 volts, is used to set the DC offset of 
the RF input. When in quiescent mode, the power 
draw can be 10 mW or less. The key BLAB1
specifications are summarized in Table 1.

Table 1
Important BLAB1 ASIC Specifications.

Item                      Value
------------------------  ----------------------------
Sampling input channels   1
Storage rows              128
Storage cells/row         512
Total storage cells       65,536
Sampling speed            0.1 - 6.0 GSa/s
Storage record            10.9 - 655 µs
Wilkinson outputs         32
Operation mode            continuous storage/readout
100 ns window readout     80 µs (5.12 GSa/s, 12 bits)
Full chip readout         ~10 ms (12 bits)

3. Readout Test System

A series of printed circuit boards have been fab- 
ricated to evaluate various aspects of BLAB1 per- 
formance. Beyond this, these evaluation devices 
are proving useful for instrumenting a next generation
of Cherenkov radiation detectors [10]. A photograph
of a circuit board carrying two BLAB1 ASICs, used for
precision differential timing evaluation, is shown in Fig. 6.

The three main components on this circuit board 
are two BLAB1 chips, an FPGA (largest pack- 
age in center), and a Universal Serial Bus (USB) 
interface. The external communication protocol 
is USB 2.0. A USB microcontroller, the Cypress 
CY7C68013-56PVC, located on the circuit board 
interprets the USB 2.0 protocol and controls the 
flow of data being sent and received from the 
FPGA to a computer interface. The FPGA used 
is a Xilinx XC3S400 and controls the digital logic 
and timing for the BLAB1 readout. An internal 
FPGA RAM buffers the data while the data is be- 
ing dumped into the USB data stream. A custom 
readout and control software utility was developed 
using the wxWidgets toolkit [11], a screenshot of
which is shown in Fig. 7. 

In this configuration, it becomes apparent that 
this BLAB1 "oscilloscope on a chip" can, with this 
small readout board, turn any PC (or laptop) into 
a high-performance digital signal oscilloscope. This 
software package sends commands to the FPGA 
and records the BLAB1 data via the USB 2.0 inter- 
face. Running this utility on a standard PC, a sus- 
tained triggered event rate of approximately 7kHz 
(single row readout) has been demonstrated. This 
rate should not be considered a hard limit as nei- 
ther the software nor the firmware was optimized 
for speed. The sampling rate is controlled by set- 
ting a DAC, which then adjusts the VDD voltage 
(ROVDD) of the on-chip voltage-controlled delays. 



4. Basic Sampler Performance

Employing the test system described in the previous
section and its variants, a number of the basic
performance parameters of the BLAB1 have been
evaluated. Because timing performance is such a
critical feature of this sampling device, it is
described in detail in a subsequent section.

Fig. 6. Photograph of the BLAB1 differential timing
performance evaluation circuit board, with key components
indicated.

Fig. 7. Screen capture of the acquisition/control program.

4.1. Sampling speed

Determination of the sampling speed is made by
measuring the time interval between insertion of
the timing strobe and appearance of the output
pulse from the last cell of the row, minus pad buffer
delays. The sampling speed is calculated by taking
the number of cells in a row and dividing it by
the propagation time for a given control voltage
setting. A plot of the sampling speed versus control
voltage (ROVDD) is shown in Fig. 8, where it is
seen that sampling rates from below 1.0 GSa/s to
above 6.0 GSa/s are possible.

[Fig. 8 plot: ROVDD versus ADC sampling speed;
horizontal axis ROVDD (volts).]

Fig. 8. Sampling rate as a function of the ROVDD control
voltage, where extended operation (> 2.5 V) is possible.

One potential disadvantage of this voltage-controlled
delay technique is that the circuit is temperature
dependent. This dependence is seen in
Fig. 9 and is roughly 0.2%/°C, in agreement with
expectation from SPICE simulation.
While for many applications this variation would
not be significant, and can potentially be calibrated
out with an external reference clock [5], the
delay can also be monitored and stabilized using a
firmware control loop.

[Fig. 9 plot: sample aperture (172 ps = 5.8 GSa/s) versus
temperature (degrees C); linear fit y = 0.4378x + 160.84,
R = 0.9991; 0.2%/degree C; matches SPICE simulation.]

Fig. 9. Temperature dependence of the sampling rate.
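
The sampling-speed arithmetic of this subsection is sketched below. The row and cell counts come from Table 1. The linear fit coefficients are those annotated on the Fig. 9 plot, and reading them as aperture in ps versus temperature in degrees C is an assumption, as are the function names and the 88 ns example propagation time.

```python
CELLS_PER_ROW = 512
ROWS = 128


def sampling_rate_gsa(row_propagation_ns: float) -> float:
    """Sampling rate in GSa/s: cells in a row / strobe propagation time."""
    return CELLS_PER_ROW / row_propagation_ns


def record_length_us(rate_gsa: float) -> float:
    """Time span held by the full array (1 GSa/s = 1000 samples per us)."""
    return CELLS_PER_ROW * ROWS / (rate_gsa * 1e3)


def aperture_ps(temp_c: float, slope: float = 0.4378,
                intercept: float = 160.84) -> float:
    """Sample aperture from the linear fit annotated on the Fig. 9 plot."""
    return slope * temp_c + intercept


rate = sampling_rate_gsa(88.0)          # assumed 88 ns row propagation
drift_pct_per_c = 100.0 * 0.4378 / aperture_ps(25.0)

print(f"rate at 88 ns/row: {rate:.2f} GSa/s")
print(f"record depth at 6.0 GSa/s: {record_length_us(6.0):.1f} us")
print(f"record depth at 0.1 GSa/s: {record_length_us(0.1):.0f} us")
print(f"aperture drift near 25 C: {drift_pct_per_c:.2f} %/C")
```

The two record depths reproduce the 10.9 - 655 µs storage record range listed in Table 1.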

4.2. Noise performance 

Noise distributions were measured for all stor- 
age cells in the process of determining the pedestal 
values. These measurements are made by termi- 
nating the BLAB1 analog input, and reading each 
cell multiple times. An example of the noise
distribution for a typical storage cell is shown in 
Fig. 10, which represents the ensemble mean noise 
average of about 1 mV RMS. With an input dy- 
namic range of greater than 1 Volt (1.5V nom.) 
and this average noise level, each stored sample 
represents 10 real ADC bits of resolution, which 
is very competitive with commercially available,
large power-dissipation ADCs [12]. 

Fig. 10. A representative storage cell noise distribution,
where a Gaussian fit yields a noise level of about 1 mV
RMS.

For comparison, the expected RMS noise due to
the small charge quantization is

$v_{RMS} = \sqrt{\frac{kT}{C_{pix}}}$    (3)

where k is Boltzmann's constant and we take T
to be 300 K. Plugging in the $C_{pix}$ from above, we
expect a contribution due to this "kTC" noise of

$v_{RMS} \approx 0.54\,\mathrm{mV}$    (4)

which subtracted in quadrature indicates that
the excess ASIC and board level noise is approximately
0.84 mV, and could perhaps be improved
through better layout.
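
The kTC estimate and the quadrature subtraction can be reproduced in a few lines. The sketch assumes the 14 fF cell capacitance and the 1 mV RMS measured noise quoted in the text; the names are illustrative.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def ktc_noise_v(capacitance_f: float, temperature_k: float = 300.0) -> float:
    """RMS thermal noise voltage sampled onto a capacitor, sqrt(kT/C)."""
    return math.sqrt(K_BOLTZMANN * temperature_k / capacitance_f)


measured_mv = 1.0                          # ensemble mean noise from Fig. 10
ktc_mv = ktc_noise_v(14e-15) * 1e3         # contribution of the 14 fF cell
excess_mv = math.sqrt(measured_mv ** 2 - ktc_mv ** 2)

print(f"kTC noise    ~ {ktc_mv:.2f} mV")
print(f"excess noise ~ {excess_mv:.2f} mV")
```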

4.3. Analog bandwidth 

A determination of the analog frequency re- 
sponse of the BLAB1 ASIC was performed by 
recording fixed amplitude sine waves of varying 
frequencies and comparing the ratio of the actual 
amplitude to the recorded amplitude. The ampli- 
tude roll-off versus frequency is shown in Fig. 11, 
where the -3 dB attenuation point is about 300 
MHz, and the -10 dB point extends beyond 600
MHz. 

To illustrate how this performance translates
into the ability to sample an RF sine wave, consider 
the uncalibrated waveform of Fig. 12. As the noise 
is small, deviations from a smooth curve give an 
indication of the level of calibration required in the 
following precision timing section. 

Fig. 12. Reference 125 MHz sine wave recorded with the
BLAB1.

4.4. Leakage current

Because leakage current is a concern for long
storage times, and the array contains a large number
of samples, which potentially take a long period
to read out completely, this issue was studied
extensively. A measurement of the leakage current
for all 64k sampling capacitors was performed. This
measurement was done by terminating the BLAB1
analog input and reading out each cell repeatedly,
without a write update, for 20 seconds. A summary
histogram of the leakage current determined
for all storage cells from a fit to each leakage current
slope is plotted in Fig. 13.

Fig. 13. Leakage current histogram for all cells of a given
BLAB1 device, where the mean leakage current is slightly
under 3 fA.

For reference, these values are in quite good
agreement with leakage currents measured previously
by our group for a similar TSMC CMOS
process in different fabrication runs [13,14]. If the
effect of this leakage current is to be reduced to
a level comparable with the noise, the following
condition must be met:

$1\,\mathrm{mV} = \Delta V = \frac{\Delta Q}{C_{pix}} = \frac{I_{leak} \cdot \Delta T}{C_{pix}}$    (5)

where $\Delta T$ is the maximum storage to readout
interval and $C_{pix}$ is the pixel storage capacitance.
Using a leakage current of $I_{leak}$ = 25 fA, which is
conservatively larger than almost all storage cells,
the maximum readout latency is thus

$\Delta T = \frac{1\,\mathrm{mV} \cdot 14\,\mathrm{fF}}{25\,\mathrm{fC/s}} = 560\,\mathrm{\mu s}$    (6)

and is discussed in the readout speed subsection
next. In general, deep storage is needed for trigger
latency buffering, and only a far smaller window of
interest need be read out.
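
The hold-time limit of Eq. (6) follows directly from the capacitor relation Q = CV. The sketch below evaluates it for the conservative 25 fA case and, for comparison, for a cell near the roughly 3 fA mean of Fig. 13; the function name is illustrative.

```python
def max_hold_time_s(droop_v: float, capacitance_f: float,
                    leak_current_a: float) -> float:
    """Time for a leakage current to change the stored voltage by droop_v.

    From dQ = C * dV and dQ = I * dT, the interval is dT = C * dV / I.
    """
    return droop_v * capacitance_f / leak_current_a


worst_case_s = max_hold_time_s(1e-3, 14e-15, 25e-15)   # conservative cell
typical_s = max_hold_time_s(1e-3, 14e-15, 3e-15)       # near the mean leakage

print(f"25 fA leakage: {worst_case_s * 1e6:.0f} us to droop by 1 mV")
print(f" 3 fA leakage: {typical_s * 1e3:.1f} ms to droop by 1 mV")
```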

It had been posited that the more extreme leak- 
age current values might correlate with the co- 
location of other logic or structures at the sampling 
array periphery. This conjecture is tested and re- 
jected in Fig. 14, where the measured leakage cur- 
rent for each cell is plotted by array location. No 
obvious pattern is seen, and the values are consis- 
tent with being completely random. 

Fig. 14. Array summary plot of the leakage current for
all 64k pixels, where vertical is row number, horizontal is
sample number and color code is in units of femtoamperes.
No pattern is observed in the layout.

4.5. Readout Speed

As mentioned earlier, there is flexibility in choice
of the resolution versus speed trade-off. If determined
to read out the entire array, the conversion
cycle duration may be expressed as

where $T_{switch}$ is the fixed latency (typically
50 ns) associated with resetting the voltage
ramp/changing addresses and $T_{conv}$ is the interval
required for the conversion to n bits, given by the
expression

$T_{conv} = 2^{n} \cdot (1\,\mathrm{ns})$    (8)



for the 500MHz, dual-phase clock reference used 
in our measurements. 

Fig. 15. Time required to read out the entire BLAB1 ASIC
as a function of the number of bits of resolution. 
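
A minimal sketch of the readout-time estimate follows. The expression relating the total readout time to the switching and conversion intervals is not legible in the source, so the code assumes the simplest reading: each ramp cycle converts 32 cells in parallel and costs the switching latency plus the conversion interval of Eq. (8). That assumption, and all names, are illustrative rather than the authors' formula.

```python
TOTAL_CELLS = 65536
PARALLEL_OUTPUTS = 32      # Wilkinson outputs converted per ramp cycle


def conversion_time_s(n_bits: int, least_count_s: float = 1e-9) -> float:
    """Eq. (8): time to count 2**n_bits steps of the 1 ns least count."""
    return (2 ** n_bits) * least_count_s


def readout_time_s(n_cells: int, n_bits: int,
                   t_switch_s: float = 50e-9) -> float:
    """Assumed model: ramp cycles * (switch latency + conversion time)."""
    n_cycles = -(-n_cells // PARALLEL_OUTPUTS)   # ceiling division
    return n_cycles * (t_switch_s + conversion_time_s(n_bits))


for bits in (8, 10, 12):
    full_ms = readout_time_s(TOTAL_CELLS, bits) * 1e3
    print(f"{bits:2d} bits: full array in {full_ms:.2f} ms")
```

Under this model a 12-bit readout of the full array takes about 8.5 ms, the same order as the ~10 ms listed in Table 1.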

As mentioned previously, full chip readout is a 
rather extreme case. For a detector of the size of 
a typical high energy physics experiment, for "fast 
timing" signals, something like 100ns is the largest 
window required. Even for a multi-km scale radio
neutrino detector, the aperture of interest would
still only be in the µs range, corresponding to less
than 10% of the array, and for which the readout 
latency would be less than a millisecond. For a 
100Hz radio trigger, or a 30kHz collision trigger, 
the deadtime is negligible for pipelined operation. 

4.6. Power Dissipation 

During sampling, the power dissipation can be 
as low as 

$P = I \cdot \Delta V = \frac{\delta Q}{\delta t} \cdot (2.5\,\mathrm{V}) \approx 15\,\mathrm{mW}$    (9)

where $\delta Q$ is the inverter transition charge and $\delta t$
is 86 ps at the nominal 5.8 GSa/s sampling. During
sampling all of the other biases may be disabled. 

Quite unexpectedly, it was observed that lowering
$\Delta V$ in the delay chain (running more slowly)
dissipated more power, opposite to what the expression
above would indicate. Below 2 V, significantly
more power was drawn. Returning to
SPICE, it was found that indeed as the ROVDD
is lowered, the leakage current of the inverters
becomes important, in particular because of the
decision to give each storage cell its own inverter
pair. That ~6 × 10^4 multiplier proved to be a
huge factor and precluded sustained low-speed
sampling due to enormous power dissipation. Data
and simulation agree qualitatively, though at large
current draws it is likely the voltage drop in the
finite resistance of the die power wiring becomes
important (and is ignored in simulation).

During readout, the current surges and the die 
subsequently heats substantially due to an over- 
sight in the original design. While the comparator 
bias currents can be shut down during sampling, 
when conversion is required, they must all be op- 
erated. Again a large multiplier (64k comparators) 
applies, and even a 10 µA comparator bias leads to
a 0.65 A surge. This is addressed in future designs.
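
The power figures in this subsection reduce to simple products, sketched below. The supply current and transition charge are back-calculated here from the quoted 15 mW, 2.5 V and 86 ps rather than stated in the text, and the function names are illustrative.

```python
def supply_current_a(power_w: float, supply_v: float) -> float:
    """Average current implied by P = I * V."""
    return power_w / supply_v


def transition_charge_c(current_a: float, transition_s: float) -> float:
    """Charge moved per inverter transition, from I = dQ / dt."""
    return current_a * transition_s


def comparator_surge_a(n_comparators: int, bias_a: float) -> float:
    """Total bias current when every in-cell comparator is enabled."""
    return n_comparators * bias_a


sampling_current = supply_current_a(15e-3, 2.5)              # Eq. (9)
delta_q = transition_charge_c(sampling_current, 86e-12)
surge = comparator_surge_a(2 ** 16, 10e-6)

print(f"sampling current  ~ {sampling_current * 1e3:.1f} mA")
print(f"transition charge ~ {delta_q * 1e12:.2f} pC")
print(f"readout surge     ~ {surge:.2f} A")
```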

Fig. 16. Simulated versus measured current draw of the
entire array of voltage-controlled delay inverters.

4.7. Concurrent Operation

A key feature of the BLAB1 architecture is the
ability to operate in a multi-hit buffer mode, to
effectively reduce the deadtime to negligible levels.
Concurrent readout while continuing to sample can 
have a deleterious impact on the quality of storage 
samples. Therefore we have performed a noise scan 
where the delay time of storage in Row 2 (adjacent 
row) is varied while Row 1 recording continues. The 
result appears in Fig. 17, where a small amount of 
cross-talk is observed right about the comparator 
transition time for Row 1. The effect is tiny (~1 
mV) and can be neglected. 







Fig. 17. Observed noise in storage Channel 2 when simul- 
taneous readout is performed in Channel 1. 

As an example of the potential benefit, for a future
16 channel BLAB2 ASIC, where a 32 ns window
(320 samples at 10 GSa/s) is recorded from
each channel upon receipt of a Level 1 trigger, the
net conversion time to 10 bits is roughly 160 µs if all
these samples are read out. However, with an expected
hit occupancy in the window for each ASIC
(monitored by trigger out signal) of about 3.2%,
the mean latency for readout is 5.12 µs. For a 30 kHz
maximum trigger rate, this is a 15.4% deadtime,
though with large fluctuations. Having an 8 deep
hold buffer for each channel (100 ns wide), the probability
of an overflow becomes a negligible $5 \times 10^{-8}$.
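
The deadtime arithmetic in this example can be retraced as follows. The sketch assumes that the future device converts 32 samples per ramp cycle, as BLAB1 does, and rounds the 10-bit conversion interval ($2^{10}$ ns) to 1 µs; both are assumptions made to reproduce the quoted figures. The buffer overflow probability depends on a queueing model that the text does not spell out, so it is not recomputed here.

```python
def full_conversion_us(n_channels: int, samples_per_channel: int,
                       parallel_outputs: int = 32,
                       t_conv_us: float = 1.0) -> float:
    """Time to convert every recorded sample, one ramp cycle per group."""
    n_cycles = n_channels * samples_per_channel / parallel_outputs
    return n_cycles * t_conv_us


full_us = full_conversion_us(16, 320)        # all samples read out
mean_latency_us = 0.032 * full_us            # 3.2% hit occupancy
deadtime_fraction = 30e3 * mean_latency_us * 1e-6   # 30 kHz trigger rate

print(f"full conversion ~ {full_us:.0f} us")
print(f"mean latency    ~ {mean_latency_us:.2f} us")
print(f"deadtime        ~ {100 * deadtime_fraction:.1f} %")
```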



5. Precision Timing Performance 

Recent developments in high-density, high pre- 
cision timing photodetectors are finding appli- 
cations in Cherenkov detection techniques for 
particle identification, as well as medical imaging 
applications. To fully exploit the potential of these 
devices, robust performance, fine resolution tim- 
ing and highly integrated readout electronics are 
needed. Over the decades a number of electron- 
ics techniques have been explored to maximize 
the timing performance of photodetector signals. 
These include Constant Fraction Discrimination, 
multi-level thresholding, charge integration for 
threshold timewalk correction, among a long list 
too extensive to adequately summarize here. 

However, all of these techniques suffer from a 
number of practical limitations in actual application,
which have served to degrade the realized performance.
In the end, one simply cannot do better
than having a high-fidelity "oscilloscope on a
chip" for every sensor channel. Cost and data vol- 
ume precluded this type of waveform recording un- 
til recent generations of SCA ASICs [5,6,7] demon- 
strated such techniques were practical, especially 
for large systems. 

We present here some preliminary results of tim- 
ing resolution tests with this BLAB1 ASIC. As 
these devices are distributed to interested users 
around the world, and more clever algorithms for 
improved timing performance are considered, fur- 
ther improvements on already promising results 
may be obtained. 

5.1. Calibration 

In order to address bin-by-bin timing width
differences, a couple of different calibration techniques
have been tried. The first utilizes a sine
wave zero-crossing technique used for calibrating
the LAB3 ASIC [5]. That technique works best
when the frequency of the sine wave is such that
the measured interval between zero crossings can
be uniquely assigned to a limited number of bins
between successive crossings.

Fig. 18. Residual bin-by-bin sample timing aperture deviations
from a nominal bin width using the histogram occupancy
technique described in the text.

Due to intrinsic curvature limitations, this technique
has an irreducible systematic error that is
a function of sample rate. A more successful technique
is to histogram the zero crossings of a sine
wave and use the bin occupancy to derive the effective
aperture width; the residual distribution is
shown in Fig. 18. The most striking aspect of this
distribution is the linear slope across the array. Applying
only this linear slope correction leads to a
15 ps RMS jitter in the determination of zero crossings
for a subsequent sine wave data set, as seen
in the inset distribution in Fig. 19. Applying a full
bin-by-bin correction improves the distribution to
11 ps RMS, with about an 8 ps core.
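
The idea behind the histogram occupancy technique can be illustrated with a toy simulation: for a sine wave that is asynchronous to the sampling, zero crossings are uniformly distributed in time, so the fraction of crossings that land in a given bin is proportional to that bin's width. The sketch below is not the calibration code used for BLAB1; the 32-cell row, the ±15% width spread, the event count and all names are assumptions chosen to keep the example small.

```python
import math
import random


def simulate_events(bin_widths_ps, freq_hz, n_events, rng):
    """Sample a unit sine wave with random phase at the true cell times."""
    times_s = [0.0]
    for width in bin_widths_ps:
        times_s.append(times_s[-1] + width * 1e-12)
    omega = 2.0 * math.pi * freq_hz
    events = []
    for _ in range(n_events):
        phase = rng.uniform(0.0, 2.0 * math.pi)
        events.append([math.sin(omega * t + phase) for t in times_s])
    return events


def widths_from_occupancy(events, total_span_ps):
    """Estimate each bin width from how often a zero crossing lands in it."""
    n_bins = len(events[0]) - 1
    counts = [0] * n_bins
    for waveform in events:
        for i in range(n_bins):
            if waveform[i] * waveform[i + 1] < 0.0:   # sign change in bin i
                counts[i] += 1
    total = sum(counts)
    return [c / total * total_span_ps for c in counts]


rng = random.Random(1)
NOMINAL_PS = 172.0
true_widths = [NOMINAL_PS * (1.0 + rng.uniform(-0.15, 0.15)) for _ in range(31)]
events = simulate_events(true_widths, 400e6, 20000, rng)
estimated = widths_from_occupancy(events, sum(true_widths))

mean_err_ps = sum(abs(e - t) for e, t in zip(estimated, true_widths)) / 31
nominal_err_ps = sum(abs(NOMINAL_PS - t) for t in true_widths) / 31
print(f"mean |error| assuming nominal widths: {nominal_err_ps:.1f} ps")
print(f"mean |error| after occupancy estimate: {mean_err_ps:.1f} ps")
```

The method requires every bin to be narrower than half the sine period, so that at most one crossing can fall in a bin, and it needs the total time span of the row, which is available from the propagation-time measurement of Section 4.1.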

5.2. Bench Test Signals 

Timing performance was then evaluated using 
a pair of pulses separated by approximately 30ns. 
As seen in Fig. 20, over this longer timebase separation,
a differential error of 27 ps is obtained. The
contribution of each edge is then estimated as
27 ps/$\sqrt{2}$, or about 20 ps per recorded edge.

Fig. 19. Results for extracting the zero-crossing timing of
a 400 MHz sine wave after the application of the histogram
occupancy timing corrections. Inset is the result for a simple
linear (slope) correction, and the main plot after a
bin-by-bin correction.

Fig. 20. Timing resolution for a pair of pulses separated
by approximately 30 ns. Each edge can be inferred to be
extracted a factor of $\sqrt{2}$ better.

For complex curvature along the leading edge of
the signal, the timing resolution obtained is seen to
be rather sensitive to the method chosen to characterize
the signal "hit" time. Unless the photodetector
signal is for a single photoelectron (p.e.), the actual
shape can be rather complex and dependent upon
photon arrival statistics. Even in this simple case,
noise and aperture systematics upon the leading
edge can be important and can also be reduced
by using multiple samples to fit to an analytic signal
shape. In general, the estimate error can improve
as something like $1/\sqrt{N}$ for N samples along
the leading edge. This is perhaps the most powerful
aspect of having the full waveform samples
to fit. Individual sampling errors can be averaged
out. Examples are provided in the following subsection,
where it is clear that at the sampling rates
being studied, this waveform recording technique
logs many samples on the leading edge, which can
be used to improve the signal timing extraction.
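
The two square-root scalings used in this section can be checked with a short sketch: the per-edge resolution inferred from a two-edge time difference, and the improvement from averaging N independent estimates. The Monte Carlo part uses Gaussian errors with an assumed 20 ps single-sample spread and N = 16; real leading-edge samples are correlated through the fit, so this shows only the idealized limit.

```python
import math
import random
import statistics


def per_edge_resolution(difference_rms: float) -> float:
    """Resolution of one edge when two equal, independent edges are differenced."""
    return difference_rms / math.sqrt(2.0)


def averaged_resolution(single_rms: float, n_samples: int) -> float:
    """Idealized error after averaging n independent estimates."""
    return single_rms / math.sqrt(n_samples)


# Toy check of the 1/sqrt(N) scaling with assumed values.
rng = random.Random(0)
means = [statistics.fmean(rng.gauss(0.0, 20.0) for _ in range(16))
         for _ in range(4000)]
measured_rms = statistics.pstdev(means)

print(f"per-edge resolution from 27 ps difference: {per_edge_resolution(27.0):.1f} ps")
print(f"expected RMS after averaging 16 samples:   {averaged_resolution(20.0, 16):.1f} ps")
print(f"simulated RMS after averaging 16 samples:  {measured_rms:.1f} ps")
```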



5.3. PMT signal observation 

A convenient feature of the BLAB1 ASIC is that 
a PMT output transmitted over a 50 Ω coaxial cable
can be directly connected to the BLAB1 input,
as per the diagram of Fig. 1. Two example pho- 
todetector outputs, intended for fast-timing appli- 
cations, are recorded in Fig. 21. 

Fig. 21. Example waveforms recorded with a Hamamatsu
R6680 fine-mesh PMT (top) and Burle 85011 Micro-Channel
Plate PMT (bottom).

Both photodetectors specialize in fine time resolution
and a direct comparison is informative. In
the upper figure, the observed signal is an aggregate
of a number of scintillation photons collected
from a bar scintillator described in the next subsection.
At bottom is the risetime of a Micro-Channel
Plate photodetector (MCP-PMT), intended for
precise single photon detection. For future sub-10 ps
devices, the transit-time spread in the single
p.e. amplification process may limit the ultimate
resolution.

Finally, affordable fast electronics may be able,
on a channel-by-channel basis, to measure systematic
variations and provide the requisite compensating
corrections to achieve the ultimate resolution.



5.4. Belle TOF Counter

In order to evaluate the waveform sampler performance
with a realistic set of pulses, we use cosmic
muons incident on a spare TOF counter of the
Belle detector [15]. The test set-up is illustrated in
Fig. 22, and is located in the University of Hawaii
Instrumentation Development Laboratory.

Fig. 22. Schematic of the cosmic test system with a Belle
TOF counter and trigger counters.

A sample of a few thousand cosmic ray muons
was recorded using the test configuration shown
in Fig. 22. PMT signals from both ends of the Bicron
BC408 plastic scintillator bar are recorded.
The bar is 4 cm thick, 255 cm long and viewed by
Hamamatsu R6680 fine-mesh PMTs at each end.
The Cherenkov trigger telescope counters consist
of lucite slabs (approx. 5 cm x 6 cm x 3.5 cm),
also viewed by prototype R6680 fine-mesh PMTs.
To estimate expected system performance, we
recorded the trigger counters and extracted an intrinsic
error on determination of the trigger time by
comparing the observed time difference in the two
trigger counters. This jitter, as shown in Fig. 23, is
quite large and should be improved in the future.

[Fig. 23 fit box: Entries 2557; Mean 4.52; RMS 0.6523;
χ²/ndf 31.29/25; Constant_Narrow 195.8 ± 20.0;
Mean_Narrow 4.518 ± 0.021; Sigma_Narrow 0.3407 ± 0.0296;
Constant_Wide 143.8 ± 18.5; Mean_Wide 4.52 ± 0.02;
Sigma_Wide 0.7615 ± 0.0268.]

Fig. 23. Timing results obtained for the trigger counter
time difference.



The contribution of the narrow Gaussian can be
subtracted in quadrature from the width of the time
difference observed at the ends of the Belle TOF counter,
the distribution of which is shown in Fig. 24.

Doing this common-mode subtraction leads to a
resolution of about 190ps per PMT end. Comparing
this observed signal resolution with a detailed
Monte Carlo study [16], we see that it is comparable
to the 150ps (170ps) or so found there for
single end times from MC (data).
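
The quadrature subtraction described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the narrow-Gaussian width 0.3407 is read from the Fig. 23 fit box and its interpretation in ns is an assumption, the end-to-end width `sigma_tof_diff_ns` is a hypothetical placeholder (the Fig. 24 fit value is not quoted in the text), and dividing by sqrt(2) to obtain a per-end resolution assumes two independent PMT ends of equal resolution.

```python
import math


def subtract_in_quadrature(sigma_total, sigma_reference):
    """Remove a known Gaussian contribution from a measured width."""
    if sigma_reference > sigma_total:
        raise ValueError("reference width exceeds the measured width")
    return math.sqrt(sigma_total**2 - sigma_reference**2)


def per_end_resolution(sigma_difference):
    """Assumes two independent ends with equal resolution."""
    return sigma_difference / math.sqrt(2.0)


sigma_narrow_ns = 0.3407   # Sigma_Narrow from the Fig. 23 fit box (ns assumed)
sigma_tof_diff_ns = 0.45   # HYPOTHETICAL end-to-end width, for illustration only

corrected_ns = subtract_in_quadrature(sigma_tof_diff_ns, sigma_narrow_ns)
single_end_ns = per_end_resolution(corrected_ns)
print(f"corrected difference width: {corrected_ns * 1e3:.0f} ps")
print(f"per-end resolution:         {single_end_ns * 1e3:.0f} ps")
```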

Fig. 24. Timing difference results from fits to the PMT
signal waveforms at each end of the Belle TOF counter.

6. Future Directions 

While the analog bandwidth of the BLAB1 is 
adequate for many RF recording applications, a 
higher bandwidth device will be explored, based 
upon the lessons learned from this first device. In 
particular, the structure and design of the analog
amplifier tree are being scrutinized and improved
in simulation. It is hoped that an almost arbitrarily
large storage depth can be accommodated at up to
1GHz of analog bandwidth through a careful layout
of the buffer amplifier cascade array. In future
devices, it is possible to significantly increase the
number of storage cells. A specific example of the
Particle Identification (PID) readout ASIC for the 
Belle upgrade is shown in Fig. 25. 

Fig. 25. Packing density estimates for a future Belle upgrade
Particle Identification readout ASIC. At top is the number
of channels versus linear dimension of the (square) storage
array; at bottom, the number of µs of storage versus
channel count, also at 10GSa/s. Pin limitations will likely
limit the practical number to 16 channels.

In the upper plot a 4µs storage depth is assumed,
or 40,000 storage cells at 10GSa/s. In the lower
plot, four separate curves indicate the number of
input channels and their subsequent depth versus
the linear dimension of the array (assumed to be square). A die
larger than 1cm per side was not considered for
yield reasons. Also, pin constraints, particularly on
making the output parallel to reduce readout latency,
probably limit the practical number of input
channels to 16. It is noted that many photodetectors
operate at gains requiring additional amplification
in order to provide a signal with sufficient
amplitude for either triggering or recording. Integration
of transimpedance and other input amplifier
topologies is being studied, and results from
future devices that use such on-chip, high analog
bandwidth elements will be reported later.
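
The correspondence between storage depth and cell count quoted above (4µs at 10GSa/s giving 40,000 cells) is simple arithmetic, sketched below. The function names are hypothetical and the sketch says nothing about cell pitch or die area, which the text does not quote.

```python
def storage_cells(sampling_rate_gsa_s, depth_us):
    """Number of storage cells needed for a given record depth.

    1 GSa/s corresponds to 1000 samples per microsecond.
    """
    return round(sampling_rate_gsa_s * 1e3 * depth_us)


def storage_depth_us(cells, sampling_rate_gsa_s):
    """Record depth in microseconds for a given number of cells."""
    return cells / (sampling_rate_gsa_s * 1e3)


cells = storage_cells(10.0, 4.0)
print(f"{cells} cells for 4 us at 10 GSa/s")
print(f"{storage_depth_us(cells, 10.0)} us for {cells} cells at 10 GSa/s")
```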

There is a misconception that waveform sampling
is significantly more expensive than traditional
discriminator + TDC methods. Certainly,
when packaged as a full oscilloscope and sold as
a commercial unit, with large buffer depth, this
can be true. Fig. 26 lists the fabricated and
quoted prices during the 2007 fiscal year in the
same TSMC 0.25µm process. It is interesting to
note that the slope defined by the first 3 devices corresponds
to Multi-Project Wafer runs, while the latter 3 are
dedicated wafer runs. Packaging is not included
and is a minimum of about $1/die in high volume.
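
The economy-of-scale argument can be illustrated with a toy cost model. This is a sketch under loudly stated assumptions: the run costs, die counts and channel count below are HYPOTHETICAL placeholders and are not the prices plotted in Fig. 26. Only the packaging figure of about $1/die comes from the text. The function name `cost_per_channel` is likewise an illustration.

```python
def cost_per_channel(run_cost_usd, dies_delivered, channels_per_die,
                     packaging_usd_per_die=1.0):
    """Per-channel cost: fabrication cost shared over dies, plus packaging."""
    if dies_delivered <= 0 or channels_per_die <= 0:
        raise ValueError("die and channel counts must be positive")
    cost_per_die = run_cost_usd / dies_delivered + packaging_usd_per_die
    return cost_per_die / channels_per_die


# HYPOTHETICAL numbers, chosen only to show the trend.
mpw = cost_per_channel(run_cost_usd=10_000.0, dies_delivered=40,
                       channels_per_die=16)
dedicated = cost_per_channel(run_cost_usd=200_000.0, dies_delivered=20_000,
                             channels_per_die=16)
print(f"multi-project run: ${mpw:.2f} per channel")
print(f"dedicated run:     ${dedicated:.2f} per channel")
```

In this model the fabrication cost per die falls with volume while the packaging term stays fixed, so packaging becomes a growing fraction of the per-channel cost for large dedicated runs.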

A summary of active ASIC designs inspired by
the performance of the BLAB1 ASIC may be found
in Table 2.

[Fig. 26 plot title: Economy of Scale for Quoted ASICs]
Fig. 26. Channel cost scaling for reference waveform sampler
ASICs, based upon recent experience.

Table 2
Future BLAB1-inspired ASIC designs.

ASIC      #      Samples    Rate       BW      Power
Acronym   Chan   per Chan   [GSa/s]    [GHz]   [mW/chan]
BLAB2     16     2k         2-10       > 1     < 20
TARGET    16     4k         0.5-1      0.5     < 20
RAL64     64     512        DC - 5     ≈ 0.3   < 10
APTD      4      8k         DC - 0.5   0.2     < 1
BIRD      1      256k       1          0.5     < 30

A number of these designs are reaching maturity 
and two have already been submitted for fabrica- 
tion. Details of the designs and results from opera- 
tion of these devices will be reported in the future. 
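
The record length implied by each entry of Table 2 follows from dividing the samples per channel by the sampling rate. The sketch below evaluates this at the highest rate listed for each design. It assumes that "k" denotes 1024 samples (it could equally mean 1000) and that entries such as "DC - 5" give an upper limit on the rate. The variable and function names are hypothetical.

```python
def record_window_ns(samples_per_channel, rate_gsa_s):
    """Record length in ns: samples divided by rate in GSa/s."""
    return samples_per_channel / rate_gsa_s


K = 1024  # ASSUMPTION: "k" in Table 2 taken as 1024 samples

# (name, samples per channel, maximum listed rate in GSa/s), from Table 2
designs = [
    ("BLAB2", 2 * K, 10.0),
    ("TARGET", 4 * K, 1.0),
    ("RAL64", 512, 5.0),
    ("APTD", 8 * K, 0.5),
    ("BIRD", 256 * K, 1.0),
]

windows = {name: record_window_ns(n, rate) for name, n, rate in designs}
for name, window in windows.items():
    print(f"{name:7s} {window / 1e3:10.3f} us at maximum rate")
```

Running a design below its maximum rate lengthens the window proportionally.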



7. Summary 

A first-generation deep-storage Switched
Capacitor Array (SCA) CMOS device has been
studied in a 0.25µm process. This architecture is
optimized for concurrent acquisition and readout,
permitting deadtimeless operation. The demonstrated
low power, high resolution and exquisite timing
performance make this device and subsequent
variants attractive for readout of a broad range of
particle and astroparticle detectors.

These devices find application niches for the following
reasons:

- Timing Performance → BLAB2 is intended for
sub-10ps photodetector pulse time recording

- Low Cost → TARGET is intended for the low-cost
instrumentation of 1M photodetector channels
of a future TeV γ telescope

- High Density → RAL64 is a dense array readout
device, where 128 channels or more could be
considered in the future

- Low Power → APTD is a demonstrator low-power
ADC device for a proposed Advanced Pair
Telescope satellite

- Extended Depth → BIRD is a very deep storage
ASIC for the future IceRay extended radio
neutrino detector at the South Pole